ANALYSIS OF VARIANCE



by 



Dennis V. Lindley
University College London 






SUMMARY 

The standard statistical analysis of data classified in two ways (say
into rows and columns) is through an analysis of variance that splits the
total variation of the data into the main effect of rows, the main effect
of columns, and the interaction between rows and columns. This paper
presents an alternative Bayesian analysis of the same situation that is
appropriate for certain types of prior knowledge. It leads to a rather
different treatment of the three factors just mentioned.
In this paper, we consider the analysis of data $(x_{ijk})$ having the
following probability structure. For given parameter values $(\theta_{ij})$,
$(\sigma_{ij}^2)$, the random variables $(x_{ijk})$ are independent and normally
distributed with $E(x_{ijk}) = \theta_{ij}$ and $\mathrm{var}(x_{ijk}) = \sigma_{ij}^2$:
here $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$; and $k = 1, 2, \ldots, r_{ij}$.

An example where this model for data might be appropriate is where $x_{ijk}$
is the performance of a subject in an educational test, the subject having
been to School $i$ and College $j$, there being $r_{ij}$ such subjects and the
suffix $k$ serving to enumerate them. There, $\theta_{ij}$ would correspond to the
true score of subjects from School $i$ and College $j$ on the test, and
$\sigma_{ij}^2$ would measure their variability. Any analysis of the data would
investigate what effects the school and college attended had on performance.
At first, we shall confine attention to the case where the variabilities
$\sigma_{ij}^2$ are all the same, equal to $\sigma^2$; and there are the same numbers
of subjects in each group, so that $r_{ij} = r$, say. This is usually referred to
as the orthogonal case, and its analysis is rather simpler than that for the
general situation which is discussed toward the end of the paper. Rather than
refer to schools and colleges, we shall use the neutral terms "rows" and
"columns"; $x_{ijk}$ is then the $k$th observation in Row $i$ and Column $j$, the
data being conveniently laid out on the page in such a row and column formation.

Let us first recall how such data are traditionally analyzed. Any
good textbook on statistics that deals with the two-way analysis of variance,
with interaction, will provide details beyond the summary which follows:
for example, Snedecor (1956, Chapter 11). We use the familiar "dot"
notation for averages. Thus, $x_{i..} = \sum_{j,k} x_{ijk}/nr$, the mean of the
data in Row $i$, the dots replacing the suffixes $j$ and $k$ over which summation
has taken place. The usual analysis breaks up the total sum of squares about
the overall mean, $x_{...}$, into at least four components.

Firstly, there is the main effect of rows

$$nr \sum_i (x_{i..} - x_{...})^2 . \tag{1}$$

Secondly, that of columns

$$mr \sum_j (x_{.j.} - x_{...})^2 . \tag{2}$$

The third is the interaction between rows and columns

$$r \sum_{i,j} (x_{ij.} - x_{i..} - x_{.j.} + x_{...})^2 , \tag{3}$$

and the last is the residual, or within groups, sum of squares

$$\sum_{i,j,k} (x_{ijk} - x_{ij.})^2 . \tag{4}$$
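The decomposition into expressions (1) to (4) can be checked numerically. The following is a minimal sketch, assuming balanced data stored in a NumPy array of shape (m, n, r); the function name `anova_sums_of_squares` and the simulated data are illustrative only and do not come from the paper.

```python
import numpy as np

def anova_sums_of_squares(x):
    """Return the sums of squares (1)-(4) and the total, for x[i, j, k]."""
    m, n, r = x.shape
    cell = x.mean(axis=2)        # x_ij.
    row = x.mean(axis=(1, 2))    # x_i..
    col = x.mean(axis=(0, 2))    # x_.j.
    grand = x.mean()             # x_...
    ss_rows = n * r * np.sum((row - grand) ** 2)                 # (1)
    ss_cols = m * r * np.sum((col - grand) ** 2)                 # (2)
    interaction = cell - row[:, None] - col[None, :] + grand
    ss_inter = r * np.sum(interaction ** 2)                      # (3)
    ss_within = np.sum((x - cell[:, :, None]) ** 2)              # (4)
    ss_total = np.sum((x - grand) ** 2)
    return ss_rows, ss_cols, ss_inter, ss_within, ss_total

# Simulated data: 4 rows, 3 columns, 5 observations per cell.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 5))
parts = anova_sums_of_squares(x)
```

In the balanced case the four components add up to the total sum of squares about the overall mean, which is the property the usual analysis relies on.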

On division by their appropriate degrees of freedom, each of the first
three may be tested against the last using the familiar F-test. If, for
example, only the first test is significant, then the column and interaction
effects are supposed zero and $\theta_{ij}$, for all $j$, is estimated by $x_{i..}$.
Comparisons between these means are effected by multiple-comparison procedures
of which Scheffé's is perhaps the most popular.

This analysis, apart from being open to the usual criticisms that can
be leveled against significance tests, is unsatisfactory in that it forces
one into the position of having to be dogmatic about whether a particular
effect exists, or not. Thus, several estimates of $\theta_{ij}$ are available
depending on the results of the tests. Two are $x_{i..}$ (mentioned above) and
$x_{i..} + x_{.j.} - x_{...}$ (if row and column, but no interaction, effects exist).
A better procedure would be to estimate the size of each of these effects
and estimate $\theta_{ij}$ accordingly. The methods developed below do just this and,
for example, weight the row in which $\theta_{ij}$ appears heavily if the row effect
appears to be large. Significance tests are, thereby, avoided.

For the one-way classification, where $E(x_{ij}) = \theta_i$, such an analysis has
been given by Lindley (1971) and extended to other situations in the context
of a general theory by Lindley and Smith (1972). In this paper, we apply
the results of the latter reference to obtain an estimate of $\theta_{ij}$ that uses
$x_{ij.}$, $x_{i..}$, $x_{.j.}$, and $x_{...}$ in a balance that depends on the relative
sizes of the main effects and interaction. In order to utilize this theory, it
is necessary to describe the prior probability distribution of the $(\theta_{ij})$
(and also $\sigma^2$, but in the first analysis this will be supposed known). In
the one-way case, it was suggested that the joint distribution might
reasonably have the property of exchangeability; that is, be invariant
under any permutation of the suffixes. This property is clearly inappropriate
in the two-way case as is seen by considering the joint distribution of a
pair, $\theta_{ij}$ and $\theta_{rs}$. Under exchangeability, this distribution is the same
for any pair of (different) $\theta$'s, whereas it would be reasonable for the
relation between $\theta_{ij}$ and $\theta_{is}$ $(j \ne s)$ in the same row to be different
from that between $\theta_{ij}$ and $\theta_{rs}$ $(i \ne r,\ j \ne s)$ in different rows (and
columns). In our example, knowledge of the performance of subjects at School $i$
and College $j$ might affect knowledge of subjects from the same school at
another college, whereas it might say little about those from a different
school at the other college. We, therefore, have to express the prior ideas
other than through exchangeability. We use, instead, a modified form of it.

Our prior opinions might lead us to think that the value of $\theta_{ij}$ is
influenced both by the row and the column that it is in. If these effects
are assumed additive, we might suppose

$$\theta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} ,$$

where $\mu$ is an overall mean, $(\alpha_i)$ and $(\beta_j)$ respectively describe row and
column effects, and $(\gamma_{ij})$ represent independent error terms, say, normal
with zero mean and variance $\sigma_c^2$. Alternatively expressed, we could say:
given $\mu$, $(\alpha_i)$, $(\beta_j)$, $\sigma_c^2$, the $\theta$'s are independent and normally
distributed with

$$E(\theta_{ij}) = \mu + \alpha_i + \beta_j \tag{5}$$

and variance $\sigma_c^2$. The rows and columns might reasonably be exchangeable;
and hence, given $\mu_a$, $\mu_b$, $\sigma_a^2$, $\sigma_b^2$, we might suppose the $\alpha$'s and
$\beta$'s independent and normally distributed with $E(\alpha_i) = \mu_a$, $E(\beta_j) = \mu_b$,
$\mathrm{var}(\alpha_i) = \sigma_a^2$, and $\mathrm{var}(\beta_j) = \sigma_b^2$.

This model fits conveniently into the framework developed by Lindley
and Smith. In their terminology, it is a four-stage model; the first stage
describes the dependence of the $x$'s on the $\theta$'s; the second, that of the
$\theta$'s on the $\alpha$'s and $\beta$'s; the third describes the structure of the $\alpha$'s
and $\beta$'s; and a fourth stage is necessary to describe the prior distributions
of $\mu_a$ and $\mu_b$. As in earlier examples, this distribution can be supposed
diffuse and the variances for $\mu_a$ and $\mu_b$ allowed to tend to infinity. It
is possible to proceed with the analysis of the four-stage form, but it
is convenient to reduce it first to a three-stage version with a diffuse
prior at the third and final stage: the two analyses are equivalent, except
for one point to be discussed later in considering the variance estimation.

To derive the three-stage model, consider the distribution of the $\theta$'s,
given $\mu$, but not the $\alpha$'s and $\beta$'s. From (5), it is clear that the
covariances are given by

$$\mathrm{cov}(\theta_{ij}, \theta_{rs}) = 0, \qquad i \ne r,\ j \ne s ; \tag{6a}$$

$$\mathrm{cov}(\theta_{ij}, \theta_{is}) = \sigma_a^2, \qquad j \ne s ; \tag{6b}$$

$$\mathrm{cov}(\theta_{ij}, \theta_{rj}) = \sigma_b^2, \qquad i \ne r ; \tag{6c}$$

and

$$\mathrm{cov}(\theta_{ij}, \theta_{ij}) = \sigma_a^2 + \sigma_b^2 + \sigma_c^2 . \tag{6d}$$

(The last is just the variance of $\theta_{ij}$.) For example, the difference between
(6a) and (6b) is just the distinction we were discussing above concerning
subjects from the same School (row) $i$. Consequently, a second stage,
which replaces the second and third stages of the first model, supposes
$(\theta_{ij})$ has a multivariate normal distribution with covariance structure
given by equations (6) and constant mean $\mu$ (now incorporating $\mu_a$ and $\mu_b$).
The third (and final) stage says the knowledge of $\mu$ is diffuse.

This is the model we suggest might be appropriate for some two-way
analyses. We must emphasize that there may well exist two-way situations
in which the above prior specification (in the second and third stages)
is quite unsuitable. Before performing an analysis of the type suggested
below, it must first be checked that the model is reasonably suitable. Our
second- and third-stage forms are assumptions that may not always be realistic.
For example, suppose the rows (schools) were of two types, say urban and
rural, then the $\alpha$'s (in the four-stage form) would not be exchangeable for
all $i$; perhaps, only within-urban and within-rural schools.

With this caution, let us summarize the model:

First stage. Given $(\theta_{ij})$, $\sigma^2$; the $(x_{ijk})$ are normal and independent
with $E(x_{ijk}) = \theta_{ij}$ and variance $\sigma^2$.

Second stage. Given $\mu$, $\sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$; the $(\theta_{ij})$ have a multivariate
normal distribution with dispersion matrix given by equations (6), and
$E(\theta_{ij}) = \mu$.

Third stage. The prior knowledge of $\mu$ is diffuse.
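The second-stage dispersion matrix can be written down directly from the covariance rules (6): zero for cells sharing neither row nor column, the row variance for cells in the same row, the column variance for cells in the same column, and the sum of all three variances on the diagonal. The sketch below assumes cells are ordered row by row (cell (i, j) at position i*n + j, counting from zero); the function name `theta_dispersion` and the numerical values are hypothetical.

```python
import numpy as np

def theta_dispersion(m, n, var_a, var_b, var_c):
    """Dispersion matrix of (theta_ij), cells ordered row by row."""
    eye_m, eye_n = np.eye(m), np.eye(n)
    ones_m, ones_n = np.ones((m, m)), np.ones((n, n))
    return (var_c * np.kron(eye_m, eye_n)     # interaction term: same cell only
            + var_a * np.kron(eye_m, ones_n)  # shared row effect: same i
            + var_b * np.kron(ones_m, eye_n)) # shared column effect: same j

# Example with m = 3 rows and n = 4 columns.
C = theta_dispersion(3, 4, var_a=2.0, var_b=3.0, var_c=5.0)
```

Each Kronecker product switches on one of the three variance components according to which suffixes two cells share.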

Our first object is, for given $\sigma^2$, $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$, to find the
posterior distribution of the $(\theta_{ij})$. It is easy to see that it will be
multivariate normal; the means will then provide estimates of the $(\theta_{ij})$,
and the dispersion matrix will enable standard errors to be attached to these
estimates. We later relax the conditions on the knowledge of the four variances
and show how they too may be estimated, merely providing revised estimates
and standard errors for the $\theta$'s. Finally, we discuss the more general first
stage where $\mathrm{var}(x_{ijk}) = \sigma_{ij}^2$; and the number, $r_{ij}$, of observations
varies from cell to cell.

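The algebraic derivation of the estimates $\theta^*_{ij}$ of $\theta_{ij}$ is given in
Appendix 1. It is there shown that

$$\begin{aligned}
\theta^*_{ij} = {} & \frac{r/\sigma^2}{r/\sigma^2 + 1/\sigma_c^2}\,(x_{ij.} - x_{i..} - x_{.j.} + x_{...})
 + \frac{r/\sigma^2}{r/\sigma^2 + 1/(\sigma_c^2 + n\sigma_a^2)}\,(x_{i..} - x_{...}) \\
 & + \frac{r/\sigma^2}{r/\sigma^2 + 1/(\sigma_c^2 + m\sigma_b^2)}\,(x_{.j.} - x_{...}) + x_{...} .
\end{aligned} \tag{7}$$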

This is the main result of this paper. The form of this estimate is
interesting. It depends on four aspects of the data: $x_{ij.}$, the mean of the
observations in cell $(i, j)$; $x_{i..}$ and $x_{.j.}$, the corresponding row and column
means; and $x_{...}$, the overall mean. It is a weighted combination of this
last, $x_{i..} - x_{...}$, the effect of the row, $x_{.j.} - x_{...}$, the effect of the
column, and $x_{ij.} - x_{i..} - x_{.j.} + x_{...}$, the interaction effect. The weights
depend on the variances $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$ in addition to the residual
variance (from the data) $\sigma^2$. Some special cases are interesting. Suppose
$\sigma_c^2 = 0$ so that, equation (5), $\theta_{ij}$ is a linear combination of the row and
column effects and no interaction exists. Then, the first term in (7) vanishes,
there is no contribution from the data-interaction effect, and $\theta^*_{ij}$ uses only
$x_{i..}$, $x_{.j.}$, and $x_{...}$. This is an extreme case corresponding to the assumed
lack of an interaction as indicated in the usual approach by a non-significant
F-test for the interaction. If, in addition to $\sigma_c^2 = 0$, $\sigma_a^2 = 0$, the second
term in (7) also vanishes and only the column effect appears from the data.
If $\sigma_a^2 = 0$ without $\sigma_c^2$ vanishing, the first and second terms in (7) combine
to give a multiple of $(x_{ij.} - x_{.j.})$. These results generalize in a natural
way those of Lindley (1971) for the one-way case in which similar weighted
combinations occurred. Later, we shall see how to estimate the four
variances and, hence, the weights.
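As a rough illustration of equation (7) and its special cases, the sketch below computes the posterior means for balanced data with the four variances treated as known. It assumes data of shape (m, n, r); the function name `posterior_cell_means` is hypothetical. Each weight (r/sigma^2)/(r/sigma^2 + 1/v) is rewritten as r*v/(r*v + sigma^2), which is algebraically the same but also allows v = 0.

```python
import numpy as np

def posterior_cell_means(x, var, var_a, var_b, var_c):
    """Estimate (7) for balanced data x[i, j, k] with known variances."""
    m, n, r = x.shape
    cell = x.mean(axis=2)                    # x_ij.
    row = cell.mean(axis=1)                  # x_i..
    col = cell.mean(axis=0)                  # x_.j.
    grand = cell.mean()                      # x_...

    def shrink(v):
        # (r/var) / (r/var + 1/v), rearranged so that v = 0 gives weight 0
        return r * v / (r * v + var)

    w_c = shrink(var_c)                      # weight on the interaction effect
    w_a = shrink(var_c + n * var_a)          # weight on the row effect
    w_b = shrink(var_c + m * var_b)          # weight on the column effect
    interaction = cell - row[:, None] - col[None, :] + grand
    return (w_c * interaction
            + w_a * (row - grand)[:, None]
            + w_b * (col - grand)[None, :]
            + grand)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 5))
theta_star = posterior_cell_means(x, var=1.0, var_a=0.5, var_b=0.5, var_c=0.2)
```

Setting `var_c` and `var_a` to zero leaves only the column effect, so every row of the result is identical, which matches the special case described in the text.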

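To obtain the posterior variances and covariances of these estimates,
write the weights in (7) as

$$w_c = \frac{r/\sigma^2}{r/\sigma^2 + 1/\sigma_c^2}, \qquad
w_a = \frac{r/\sigma^2}{r/\sigma^2 + 1/(\sigma_c^2 + n\sigma_a^2)}, \qquad
w_b = \frac{r/\sigma^2}{r/\sigma^2 + 1/(\sigma_c^2 + m\sigma_b^2)} . \tag{8}$$

Then, (7) becomes

$$\theta^*_{ij} = w_c (x_{ij.} - x_{i..} - x_{.j.} + x_{...}) + w_a (x_{i..} - x_{...}) + w_b (x_{.j.} - x_{...}) + x_{...} .$$

If we further put

$$nW_a = w_a - w_c, \qquad mW_b = w_b - w_c, \qquad mnW = w_c - w_a - w_b + 1 , \tag{9}$$

and put $w_c = W_c$, for symmetry, (7) can be written

$$\theta^*_{ij} = W_c x_{ij.} + nW_a x_{i..} + mW_b x_{.j.} + mnW x_{...} . \tag{10}$$

For reasons given in Appendix 1, the dispersion matrix for $(\theta_{ij})$ is given by
[compare equations (6)]

$$\mathrm{cov}(\theta_{ij}, \theta_{rs}) = W\sigma^2/r, \qquad i \ne r,\ j \ne s ; \tag{11a}$$

$$\mathrm{cov}(\theta_{ij}, \theta_{is}) = (W_a + W)\sigma^2/r, \qquad j \ne s ; \tag{11b}$$

$$\mathrm{cov}(\theta_{ij}, \theta_{rj}) = (W_b + W)\sigma^2/r, \qquad i \ne r ; \tag{11c}$$

and

$$\mathrm{cov}(\theta_{ij}, \theta_{ij}) = (W_a + W_b + W_c + W)\sigma^2/r . \tag{11d}$$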

These expressions are somewhat cumbersome since the $W$'s are fairly
complicated, but some results are a little easier. For example, consider
the posterior variance of $\theta_{ij} - \theta_{is}$ $(j \ne s)$, that is the difference between
Columns $j$ and $s$ in the same Row $i$. It is $2\,\mathrm{var}(\theta_{ij}) - 2\,\mathrm{cov}(\theta_{ij}, \theta_{is})$,
which, from (11b) and (11d), is $2(W_b + W_c)\sigma^2/r$. For the means of rows
(or columns), the results are easier still. For example, the variance of
$\theta_{i.} - \theta_{r.}$, the difference between two rows (schools) averaged over columns
(colleges) is $(i \ne r,\ j \ne s)$,

$$\begin{aligned}
\mathrm{var}(\theta_{i.} - \theta_{r.}) &= n^{-2}\,\mathrm{var}\Big(\sum_j \theta_{ij} - \sum_s \theta_{rs}\Big) \\
&= n^{-2}\big[\,2n\,\mathrm{var}(\theta_{ij}) - 2n\,\mathrm{cov}(\theta_{ij}, \theta_{rj})
 + 2n(n-1)\,\mathrm{cov}(\theta_{ij}, \theta_{is}) - 2n(n-1)\,\mathrm{cov}(\theta_{ij}, \theta_{rs})\big] \\
&= \frac{2\sigma^2}{rn}\big[\,W_a + W_b + W_c + W - (W_b + W) + (n-1)(W_a + W) - (n-1)W\big]
\end{aligned}$$

from (11), and using (9), this is finally equal to

$$2\sigma^2 w_a / rn . \tag{12}$$
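The reduction of the row-mean variance to the single weight for rows can be confirmed with a few lines of arithmetic. This is a sketch under assumed values: the numbers for m, n, r and the four variances are arbitrary, and the names are illustrative. It builds the four distinct covariances (zero, one, or two shared suffixes, plus the variance) from the capital-W weights, combines them as in the derivation, and compares the result with 2*sigma^2*w_a/(r*n).

```python
import math

def weights(m, n, r, var, var_a, var_b, var_c):
    """Lower-case weights and the derived capital-W weights, balanced case."""
    def shrink(v):
        # (r/var) / (r/var + 1/v), rearranged so that v = 0 is allowed
        return r * v / (r * v + var)
    w_c = shrink(var_c)
    w_a = shrink(var_c + n * var_a)
    w_b = shrink(var_c + m * var_b)
    W_c = w_c
    W_a = (w_a - w_c) / n
    W_b = (w_b - w_c) / m
    W = (w_c - w_a - w_b + 1) / (m * n)
    return w_a, W_a, W_b, W_c, W

m, n, r = 4, 3, 5
var, var_a, var_b, var_c = 2.0, 1.0, 0.5, 0.25
w_a, W_a, W_b, W_c, W = weights(m, n, r, var, var_a, var_b, var_c)

scale = var / r
cov_other = W * scale                       # different row and column
cov_same_row = (W_a + W) * scale            # same row, different column
cov_same_col = (W_b + W) * scale            # same column, different row
variance = (W_a + W_b + W_c + W) * scale    # a single cell

# Variance of the difference between two row means, term by term ...
from_covariances = (2 * n * variance - 2 * n * cov_same_col
                    + 2 * n * (n - 1) * cov_same_row
                    - 2 * n * (n - 1) * cov_other) / n ** 2
# ... and the closed form it should reduce to.
closed_form = 2 * var * w_a / (r * n)
```

Because `w_a` lies strictly between 0 and 1 here, the closed form is smaller than 2*sigma^2/(r*n), the value the unshrunk row means would give.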



Since $\theta^*$ [equation (7)] is the posterior mean, the mean of $\theta_{i.}$ is
$\theta^*_{i.}$, which, from (7), is easily seen to be

$$w_a (x_{i..} - x_{...}) + x_{...} = w_a x_{i..} + (1 - w_a) x_{...} ,$$

a weighted average of $x_{i..}$ and $x_{...}$. Had $x_{i..}$ been used as an estimate,
as standard theory would suggest, then the variance for $\theta_{i.} - \theta_{r.}$ quoted
would be $2\sigma^2/rn$ rather than this times $w_a$, given by (12). Hence, our
estimate is pulled toward the overall mean and has smaller variances when
compared with other values. It follows that the usual multiple comparison
procedures, such as Scheffé's, are unnecessary in our approach. The
shift toward the mean and the reduced standard errors perform exactly the
function that these orthodox procedures are designed to provide.

These estimates (and standard errors) depend upon knowledge of the
four variances $\sigma^2$, $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$. In any application, these are
typically unknown but can be estimated from the data. This is obvious for
$\sigma^2$ but is also true for the others since there is replication of rows and
columns. We proceed to discuss their estimation.

Lindley and Smith, in discussing the general theory, show that if we
are content with posterior modes for estimates (rather than posterior means),
we can continue to estimate $\theta_{ij}$ by equations (7) provided we insert, for the
four variances, modal estimates of them. It will, therefore, suffice to
find the posterior modes for the variances. It is inconvenient to do this
within the context of the three-stage model because the compression of two
stages into one results in $\sigma_c^2$ (for the original second stage) being combined
with $\sigma_a^2$ and $\sigma_b^2$ (from the third stage) in expressions like
$\sigma_c^2 + n\sigma_a^2$, and we have the difficulties familiar in components of variance
problems (or what is sometimes called Type II analysis of variance) of having
to estimate $\sigma_c^2$ and $\sigma_c^2 + n\sigma_a^2$ separately, and hence $\sigma_a^2$ by subtraction,
so leading to the possibility of negative estimates for $\sigma_a^2$, or even within
the Bayesian framework, to difficult calculations. This can be avoided by using
the four-stage model, when the procedure is essentially to estimate $(\alpha_i)$ and
$(\beta_j)$ and, hence, $\sigma_a^2$ by a multiple of $\sum (\alpha^*_i - \alpha^*_.)^2$; similarly, $\sigma_b^2$.
Also, $\sigma_c^2$ can be found from the sums of squares of
$\theta^*_{ij} - \alpha^*_i - \beta^*_j$ [see equation (5)]. Finally, $\sigma^2$ can be found, although
the usual within sum of squares is not enough since $\theta_{ij}$ is, within the
present theory, not estimated by $x_{ij.}$ as is usual. Hence, the within-sum
underestimates the total variation that contributes to $\sigma^2$. All these ideas
are straightforward generalizations of ideas contained in the papers to which
reference has already been made.

The details of the calculation of the posterior modes are given in
Appendix 2. Equations (2.3) and (2.4) provide estimates of $(\alpha_i)$ and $(\beta_j)$,
respectively. Notice that only the deviation from the mean is estimated,
which is all that is necessary. Distinction should be made between the
estimate of, for example, $\alpha_i - \alpha_.$ by [equation (2.3)]

$$(\alpha_i - \alpha_.)^* = \frac{rn\sigma_a^2}{rn\sigma_a^2 + r\sigma_c^2 + \sigma^2}\,(x_{i..} - x_{...})$$

and that of $\theta_{i.} - \theta_{..}$ by

$$(\theta_{i.} - \theta_{..})^* = \frac{rn\sigma_a^2 + r\sigma_c^2}{rn\sigma_a^2 + r\sigma_c^2 + \sigma^2}\,(x_{i..} - x_{...})$$

[from (1.18), or (7) on summing over $j$, and a little simplification]. The
difference is that $\theta_{i.}$ is the average for Row $i$ over the columns used in the
experiment, whereas $\alpha_i$ is a similar average not confined to the columns of
the experiment. In particular, $\alpha^*_i$ is shrunk more toward the overall mean
than is $\theta^*_{i.}$, since the coefficient of the deviation $(x_{i..} - x_{...})$ is smaller
in the former.

Equations (2.5) provide the estimates of the variances, using the
estimates for $(\alpha_i)$ and $(\beta_j)$ just obtained as well as those for $(\theta_{ij})$ already
calculated. Those estimates, in turn, depend on the variances, and so some
iterative procedure has to be used. We suggest the following: Obtain
initial estimates of the four variances from the usual analysis of variance
expressions, expressions (1) to (4), divided by their respective degrees of
freedom. These will be unsatisfactory estimates but will serve to provide
weights to be used to estimate the $\theta$'s [equation (7)] and the $\alpha$'s and $\beta$'s.
With these estimated, new values for the variances can be found from equations
(2.5) and the cycle repeated until convergence.
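The cycle just described might be sketched as follows for balanced data. This is an illustration under stated assumptions, not a definitive implementation: the function name `fit_two_way`, the stopping rule, and the default prior hyperparameters (nu_t, lambda_t) are all hypothetical, and the update formulas are the modal estimates of Appendix 2 as understood here, namely shrunken row and column deviations followed by variance updates of the form (prior term + sum of squares)/(count + nu + constant). It needs m, n and r all at least 2.

```python
import numpy as np

def fit_two_way(x, nu=0.0, lam=0.0, prior_a=(1.0, 1.0), prior_b=(1.0, 1.0),
                prior_c=(1.0, 1.0), max_iter=500, tol=1e-10):
    """Alternate between estimating the means and the four variances.

    prior_t = (nu_t, lambda_t) are inverse-chi-squared hyperparameters;
    the defaults are placeholders, not recommendations.
    """
    m, n, r = x.shape
    cell = x.mean(axis=2)
    row, col, grand = cell.mean(axis=1), cell.mean(axis=0), cell.mean()
    row_dev, col_dev = row - grand, col - grand
    inter = cell - row[:, None] - col[None, :] + grand
    S2 = np.sum((x - cell[:, :, None]) ** 2)     # within-cells sum of squares

    # Starting values: the four ANOVA sums of squares over their degrees of freedom.
    s2 = S2 / (m * n * (r - 1))
    s2_a = n * r * np.sum(row_dev ** 2) / (m - 1)
    s2_b = m * r * np.sum(col_dev ** 2) / (n - 1)
    s2_c = r * np.sum(inter ** 2) / ((m - 1) * (n - 1))

    for _ in range(max_iter):
        w_c = r * s2_c / (r * s2_c + s2)
        w_a = r * (s2_c + n * s2_a) / (r * (s2_c + n * s2_a) + s2)
        w_b = r * (s2_c + m * s2_b) / (r * (s2_c + m * s2_b) + s2)
        theta = w_c * inter + w_a * row_dev[:, None] + w_b * col_dev[None, :] + grand
        alpha = r * n * s2_a / (r * n * s2_a + r * s2_c + s2) * row_dev   # row deviations
        beta = r * m * s2_b / (r * m * s2_b + r * s2_c + s2) * col_dev    # column deviations
        gamma = theta - grand - alpha[:, None] - beta[None, :]            # interaction part
        new = (
            (nu * lam + S2 + r * np.sum((cell - theta) ** 2)) / (m * n * r + nu + 2),
            (prior_a[0] * prior_a[1] + np.sum(alpha ** 2)) / (m + prior_a[0] + 1),
            (prior_b[0] * prior_b[1] + np.sum(beta ** 2)) / (n + prior_b[0] + 1),
            (prior_c[0] * prior_c[1] + np.sum(gamma ** 2)) / (m * n + prior_c[0] + 1),
        )
        change = max(abs(a - b) for a, b in zip(new, (s2, s2_a, s2_b, s2_c)))
        s2, s2_a, s2_b, s2_c = new
        if change < tol:
            break
    return theta, (s2, s2_a, s2_b, s2_c)

rng = np.random.default_rng(0)
x = 10 + rng.normal(size=(4, 3, 5))
theta_hat, variances = fit_two_way(x)
```

Keeping nu_a, nu_b and nu_c positive stops the between-row, between-column and interaction variances from collapsing to zero during the iteration, which is one practical reason the prior terms cannot simply be dropped.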

Notice that the estimates (2.5) involve quantities derived from the
prior distributions of the variances. There is no objection to putting
$\nu$, corresponding to $\sigma^2$, equal to zero; but the remaining values, $\nu_a$,
$\nu_b$, and $\nu_c$, cannot be ignored. The difficulty is that if $\sigma_a^2$, $\sigma_b^2$, or
$\sigma_c^2$ are small in comparison with $\sigma^2$ (or more correctly $\sigma^2/r$), there is
little information in the data from which to estimate them since the variation
in the $(x_{ij.})$ is mostly due to $\sigma^2$. In this case, the prior knowledge is
clearly important and so naturally arises in any estimation procedure. If
$\sigma_a^2$, for example, is large, its estimation is easier, and in (2.5b), the sum
of squares for $\alpha^*$ will dominate $\nu_a\lambda_a$ unless the latter is large: that term
and $\nu_a$ in the denominator may be ignored.

Whilst the estimates for $\theta_{ij}$, given the variances, are almost certainly
satisfactory, it may be possible to improve the estimation of the variances
in comparison with the methods given in this paper; and we hope to study the
problem in more detail later. In the meantime, it might be reasonable to
guess that the term $mn$ in the denominator of (2.5d) might be replaced by
the degrees of freedom $(m - 1)(n - 1)$. In deriving modes, rather than
means, the usual integrations that remove degrees of freedom do not take
place, and hence, the divisor always involves the total number, here $mn$,
of parameters. Another way of looking at it is to appreciate that the
modes of marginal distributions are not the components of the modes of
the whole distribution.

The discussion has so far been confined to the case where there is
the same number, $r$, of observations in each cell. Suppose now that there
are $r_{ij}$ observations in the cell in the $i$th row and $j$th column. In this case
it is not possible to obtain simple expressions for the estimates $\theta^*_{ij}$ as in
equation (7). Instead, we have to be content with linear equations for
them which can then be solved numerically in any particular case. The
estimates of $(\alpha_i)$ and $(\beta_j)$ follow with minor modifications as does the
estimation of the variances. Details are given in Appendix 3.



The last generalization we make is to the case where the within-cell
variance $\sigma_{ij}^2$ is not constant. In most applications, $r_{ij}$ will not be
large enough to effect a good estimation of $\sigma_{ij}^2$; but if the latter are
assumed connected in some way, then sensible estimation may be possible.
We have been able to make progress in the case where all the $(\sigma_{ij}^2)$ are
exchangeable. Ideally perhaps, one could make a modified exchangeability
assumption as we have with the means, but I have not been able to develop
a satisfactory procedure. Details with the full exchangeability assumption
are given in Appendix 4. Appendix 5 summarizes the calculations required
in the general case. Finally, Appendix 6 provides a simple numerical example 

APPENDIX 1: Posterior Distribution of the Cell
Means Assuming the Variances Known



When writing out vectors of elements depending on two or more suffixes,
we shall use a lexicographical order: thus,

$$\theta^T = (\theta_{11}, \theta_{12}, \ldots, \theta_{1n}, \theta_{21}, \theta_{22}, \ldots, \theta_{mn}) .$$

The three-stage model is exactly in the linear framework developed by Lindley
and Smith, and their corollary 2 [equations (16) and (17)] shows that the
posterior distribution of $(\theta_{ij})$ is normal with first and second moments
there stated. Their notation is

First stage. $E(x) = A_1\theta_1$, dispersion matrix $C_1$.
Second stage. $E(\theta_1) = A_2\theta_2$, dispersion matrix $C_2$.

Then, the posterior distribution of $\theta_1$ is $N(Dd, D)$ with

$$D^{-1} = A_1^T C_1^{-1} A_1 + C_2^{-1} - C_2^{-1} A_2 (A_2^T C_2^{-1} A_2)^{-1} A_2^T C_2^{-1} \tag{1.1}$$

and

$$d = A_1^T C_1^{-1} x . \tag{1.2}$$

We proceed to evaluate (1.1) and (1.2). The matrix $C_2$ is given in equations
(6): thus, the element in the row corresponding to $\theta_{ij}$ and column corresponding
to $\theta_{is}$ $(j \ne s)$ is $\sigma_a^2$, and others similarly. The inversion required for (1.1)
is most easily accomplished by solving the equations in $z$, $C_2 z = a$. Written
out in full, these are

$$\sigma_c^2 z_{ij} + n\sigma_a^2 z_{i.} + m\sigma_b^2 z_{.j} = a_{ij} , \tag{1.3}$$

using the "dot" notation. Summing over $i$ and $j$, we have

$$(\sigma_c^2 + n\sigma_a^2 + m\sigma_b^2)\, z_{..} = a_{..}$$

or

$$z_{..} = a_{..}/v_{mn} , \tag{1.4}$$

where

$$v_{mn} = \sigma_c^2 + n\sigma_a^2 + m\sigma_b^2 . \tag{1.5}$$

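Summing (1.3) over $j$, we similarly obtain

$$(\sigma_c^2 + n\sigma_a^2)\, z_{i.} + m\sigma_b^2 z_{..} = a_{i.} ,$$

which, on using (1.4), can be written

$$z_{i.} = (a_{i.} - m\sigma_b^2 a_{..}/v_{mn})/v_n , \tag{1.6}$$

where

$$v_n = \sigma_c^2 + n\sigma_a^2 . \tag{1.7}$$

Similarly,

$$z_{.j} = (a_{.j} - n\sigma_a^2 a_{..}/v_{mn})/v_m , \tag{1.8}$$

where

$$v_m = \sigma_c^2 + m\sigma_b^2 . \tag{1.9}$$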

Substitution of (1.6) and (1.8) into (1.3) gives

$$z_{ij} = \sigma_c^{-2}\left[ a_{ij} - \frac{n\sigma_a^2}{v_n}\,(a_{i.} - m\sigma_b^2 a_{..}/v_{mn})
 - \frac{m\sigma_b^2}{v_m}\,(a_{.j} - n\sigma_a^2 a_{..}/v_{mn}) \right] . \tag{1.10}$$

Since $z = C_2^{-1} a$, identification of terms on the right-hand side shows that
$C_2^{-1}$ has the same structure as $C_2$ itself [equations (6)]. For example, all
the terms in rows $(i, j)$ and columns $(r, s)$ with $i \ne r$, $j \ne s$ are the same.
From (1.10) the terms are

$$i \ne r,\ j \ne s: \quad h , \tag{1.11a}$$

$$i = r,\ j \ne s: \quad f + h , \tag{1.11b}$$

$$i \ne r,\ j = s: \quad g + h , \tag{1.11c}$$

$$i = r,\ j = s: \quad e + f + g + h , \tag{1.11d}$$

where $h$ is the coefficient of $mn\,a_{..}$ in (1.10). That is,

$$h = \frac{\sigma_a^2 \sigma_b^2}{\sigma_c^2 v_{mn}} \left( \frac{1}{v_n} + \frac{1}{v_m} \right) ; \tag{1.12a}$$

$f$ is the coefficient of $n\,a_{i.}$ in (1.10); namely,

$$f = -\sigma_a^2/\sigma_c^2 v_n . \tag{1.12b}$$

Similarly, $g$ is the coefficient of $m\,a_{.j}$, so

$$g = -\sigma_b^2/\sigma_c^2 v_m , \tag{1.12c}$$

and $e$ corresponds to $a_{ij}$; namely,

$$e = \sigma_c^{-2} . \tag{1.12d}$$

We note for future reference that summation of (1.10) over $i$ and $j$ gives
$z_{..} = a_{..}(e + nf + mg + mnh)$ so that, on comparison with (1.4),

$$e + nf + mg + mnh = v_{mn}^{-1} . \tag{1.13}$$
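The pattern claimed for the inverse of the second-stage dispersion matrix can be checked by direct inversion. The sketch below assumes arbitrary illustrative values for m, n and the three variances (none of them from the paper), builds the matrix with cells ordered row by row, and compares entries of its numerical inverse with the closed forms for e, f, g and h.

```python
import numpy as np

m, n = 3, 4
var_a, var_b, var_c = 0.8, 0.5, 0.3

# Second-stage dispersion matrix, cell (i, j) at position i * n + j.
C2 = (var_c * np.eye(m * n)
      + var_a * np.kron(np.eye(m), np.ones((n, n)))
      + var_b * np.kron(np.ones((m, m)), np.eye(n)))
C2_inv = np.linalg.inv(C2)

v_mn = var_c + n * var_a + m * var_b
v_n = var_c + n * var_a
v_m = var_c + m * var_b
h = var_a * var_b / (var_c * v_mn) * (1 / v_n + 1 / v_m)
f = -var_a / (var_c * v_n)
g = -var_b / (var_c * v_m)
e = 1 / var_c

def idx(i, j):
    """Position of cell (i, j) in the row-by-row ordering."""
    return i * n + j

different_cell = C2_inv[idx(0, 0), idx(1, 1)]   # expected h
same_row = C2_inv[idx(0, 0), idx(0, 1)]         # expected f + h
same_col = C2_inv[idx(0, 0), idx(1, 0)]         # expected g + h
diagonal = C2_inv[idx(0, 0), idx(0, 0)]         # expected e + f + g + h
```

The row sums of the inverse all equal e + nf + mg + mnh, which is the quantity that the next step of the derivation needs.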

Having evaluated $C_2^{-1}$, we now return to (1.1). $A_2$ is easily seen to be a
vector, all of whose elements are unity. Hence, $A_2^T C_2^{-1}$ is a (row) vector, all
of whose elements are $e + nf + mg + mnh = v_{mn}^{-1}$ [from (1.13)]. Hence,
$A_2^T C_2^{-1} A_2 = mn\,v_{mn}^{-1}$. Simple calculation shows that
$C_2^{-1} A_2 (A_2^T C_2^{-1} A_2)^{-1} A_2^T C_2^{-1}$ is a matrix, every element of which is
$(mn\,v_{mn})^{-1}$.

Reference to the first stage of the model shows easily that $A_1^T C_1^{-1} A_1$
is a diagonal matrix with every diagonal element equal to $r/\sigma^2$. Consequently,
$D^{-1}$ [equation (1.1)] is a matrix of the same form as $C_2^{-1}$ [equations (1.11)];
but with $e$ replaced by $e + r/\sigma^2 = e'$, say, and $h$ by $h - (mn\,v_{mn})^{-1} = h'$, say.
The values of $f$ and $g$ are unaltered. Further consideration of the first
stage of the model shows that $d$ is a vector whose $(i, j)$ element is $x_{ij.}\,r/\sigma^2$.


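If $\theta^*_{ij}$ denotes the estimate of $\theta_{ij}$, that is, the posterior mean of
$\theta_{ij}$ in the joint distribution, the corollary quoted above shows that
$\theta^* = Dd$, or $D^{-1}\theta^* = d$. Inserting the values of $D^{-1}$ and $d$ just obtained
and writing these equations out in full, we have

$$e'\theta^*_{ij} + nf\theta^*_{i.} + mg\theta^*_{.j} + mnh'\theta^*_{..} = x_{ij.}\,r/\sigma^2 \tag{1.14}$$

[compare equations (1.3)]. These equations are most easily solved by
writing

$$\phi^*_{ij} = \theta^*_{ij} - \theta^*_{i.} - \theta^*_{.j} + \theta^*_{..}, \qquad
\phi^*_{i.} = \theta^*_{i.} - \theta^*_{..}, \qquad
\phi^*_{.j} = \theta^*_{.j} - \theta^*_{..} , \tag{1.15}$$

and

$$y_{ij} = x_{ij.} - x_{i..} - x_{.j.} + x_{...}, \qquad
y_{i.} = x_{i..} - x_{...}, \qquad
y_{.j} = x_{.j.} - x_{...} . \tag{1.16}$$

We can then rewrite (1.14) as

$$e'\phi^*_{ij} + (e' + nf)\phi^*_{i.} + (e' + mg)\phi^*_{.j} + (e' + nf + mg + mnh')\theta^*_{..}
 = (y_{ij} + y_{i.} + y_{.j} + x_{...})\,r/\sigma^2 . \tag{1.17}$$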
We note, from (1.13), and the fact that $e' = e + r/\sigma^2$, $h' = h - (mn\,v_{mn})^{-1}$,
that $e' + nf + mg + mnh' = r/\sigma^2$.

Summation of (1.17) over $i$ and $j$ gives $\theta^*_{..} = x_{...}$, over $j$ alone gives
$(e' + nf)\phi^*_{i.} = y_{i.}\,r/\sigma^2$ or

$$\phi^*_{i.} = \frac{r/\sigma^2}{r/\sigma^2 + v_n^{-1}}\; y_{i.} \tag{1.18}$$

on inserting the values for $e' = e + r/\sigma^2$, $e$, (1.12d) and $f$, (1.12b).
Similarly,

$$\phi^*_{.j} = \frac{r/\sigma^2}{r/\sigma^2 + v_m^{-1}}\; y_{.j} , \tag{1.19}$$

and inserting these values into (1.17),

$$\phi^*_{ij} = \frac{r/\sigma^2}{r/\sigma^2 + \sigma_c^{-2}}\; y_{ij} . \tag{1.20}$$

Returning to the original form in terms of $\theta^*_{ij}$ and $x_{ij.}$, we easily obtain the
expressions given in (7).

The dispersion matrix for these estimates (that is, the dispersion matrix
of the posterior normal distribution) is, by the corollary, $D$. The equations
just solved are $\theta^* = Dd$, so $D$ may be found by taking the coefficients of the
elements, $x_{ij.}\,r/\sigma^2$, of $d$ in the solutions. For example, to obtain the
covariance of $\theta_{ij}$ and $\theta_{rs}$ with $i \ne r$, $j \ne s$, it is only necessary to take the
coefficient of $x_{rs.}\,r/\sigma^2$ in the expression for $\theta^*_{ij}$. In the notation given by
(8) and (9), this is easily seen from (10) to be $W\sigma^2/r$, since $x_{rs.}$ only occurs
in $x_{...}$, and there with coefficient $W$. All the expressions given in equations (11)
can be obtained in the same way.
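The whole of this appendix can be cross-checked numerically by evaluating the posterior mean through the matrix route and comparing it with the closed form. The sketch below assumes the Lindley and Smith expression for a diffuse final stage, with posterior precision equal to the data precision plus the prior precision less its projection on the constant vector; the dimensions, variances and simulated cell means are arbitrary illustrative choices.

```python
import numpy as np

m, n, r = 3, 4, 2
var, var_a, var_b, var_c = 1.5, 0.8, 0.5, 0.3
rng = np.random.default_rng(1)
cell = rng.normal(size=(m, n))            # stand-in for the cell means x_ij.

# Second-stage dispersion matrix, cells ordered row by row.
C2 = (var_c * np.eye(m * n)
      + var_a * np.kron(np.eye(m), np.ones((n, n)))
      + var_b * np.kron(np.ones((m, m)), np.eye(n)))
C2_inv = np.linalg.inv(C2)
A2 = np.ones((m * n, 1))                  # every theta_ij has prior mean mu

# Posterior precision and the vector d for the balanced first stage.
D_inv = ((r / var) * np.eye(m * n) + C2_inv
         - C2_inv @ A2 @ np.linalg.inv(A2.T @ C2_inv @ A2) @ A2.T @ C2_inv)
d = (r / var) * cell.ravel()
D = np.linalg.inv(D_inv)
theta_matrix = (D @ d).reshape(m, n)

# Closed form: weighted interaction, row and column effects plus overall mean.
def shrink(v):
    return r * v / (r * v + var)

w_c, w_a, w_b = shrink(var_c), shrink(var_c + n * var_a), shrink(var_c + m * var_b)
row, col, grand = cell.mean(axis=1), cell.mean(axis=0), cell.mean()
theta_closed = (w_c * (cell - row[:, None] - col[None, :] + grand)
                + w_a * (row - grand)[:, None]
                + w_b * (col - grand)[None, :]
                + grand)

# Posterior covariance of two cells sharing neither row nor column.
W = (w_c - w_a - w_b + 1) / (m * n)
cov_different = D[0, n + 1]               # cells (0, 0) and (1, 1)
```

Agreement between `theta_matrix` and `theta_closed`, and between `cov_different` and `W * var / r`, is what the derivation predicts for any positive choice of the variances.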
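APPENDIX 2: Estimation of the Variance Components

In the four-stage model, described by (5) and the following sentence,
the joint probability distribution of all the random quantities $(x_{ijk})$,
$(\theta_{ij})$, $(\alpha_i)$, and $(\beta_j)$, after integration with respect to the diffuse
priors of $\mu$, $\mu_a$ and $\mu_b$, is easily seen to be proportional to

$$\begin{aligned}
& \sigma^{-mnr}\,\sigma_c^{-mn+1}\,\sigma_a^{-m+1}\,\sigma_b^{-n+1}
 \exp\left\{ -\frac{1}{2\sigma^2}\Big[ S^2 + r\sum_{i,j} (x_{ij.} - \theta_{ij})^2 \Big] \right\} \\
& \times \exp\left\{ -\frac{1}{2\sigma_c^2}\sum_{i,j} (\theta_{ij} - \alpha_i + \alpha_. - \beta_j + \beta_. - \theta_{..})^2
 - \frac{1}{2\sigma_a^2}\sum_i (\alpha_i - \alpha_.)^2
 - \frac{1}{2\sigma_b^2}\sum_j (\beta_j - \beta_.)^2 \right\} .
\end{aligned} \tag{2.1}$$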

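There, the total sum of squares for the data has been broken into the two
components within- and between-cells. Differentiation with respect to the
$\theta$'s, $\alpha$'s, and $\beta$'s, and equating the results to zero gives modal estimates
for these parameters. It is not difficult to verify that for $\theta_{ij}$ is exactly
$\theta^*_{ij}$ given by the three-stage model in equation (7). We proceed to find the
corresponding modes $(\alpha^*_i)$ and $(\beta^*_j)$. The result of differentiating (2.1) with
respect to $\alpha_i$ is easily seen to be proportional to

$$n\{\phi_{i.} - (\alpha_i - \alpha_.)\}/\sigma_c^2 - (\alpha_i - \alpha_.)/\sigma_a^2 , \tag{2.2}$$

where $\phi_{i.} = \theta_{i.} - \theta_{..}$ [cf. (1.15)]. Equating this to zero and using the
estimate of $\phi_{i.}$ [equation (1.18)], we easily obtain

$$(\alpha_i - \alpha_.)^* = \frac{rn\sigma_a^2}{rn\sigma_a^2 + r\sigma_c^2 + \sigma^2}\,(x_{i..} - x_{...}) . \tag{2.3}$$

Similarly,

$$(\beta_j - \beta_.)^* = \frac{rm\sigma_b^2}{rm\sigma_b^2 + r\sigma_c^2 + \sigma^2}\,(x_{.j.} - x_{...}) . \tag{2.4}$$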
With these estimates, it is an easy matter to obtain equations for the
modal estimates of the four variance components $\sigma^2$, $\sigma_a^2$, $\sigma_b^2$, and
$\sigma_c^2$. Suppose these have independent prior distributions which are all
inverse-$\chi^2$. Specifically, let

$$\nu\lambda/\sigma^2 \sim \chi^2_{\nu}, \qquad \nu_t\lambda_t/\sigma_t^2 \sim \chi^2_{\nu_t} \qquad (t = a, b, c) .$$



Multiplication of the distribution (2.1) by this prior gives the
posterior distribution of all the parameters, including the variances,
apart from constant factors. The modal equations for the variances are
straightforward since the expression factors into four parts, each depending
on one of the variances. The results are (we use $s^2$ for an estimate of $\sigma^2$
rather than the asterisk notation used with the other parameters)

$$s^2 = \Big\{ \nu\lambda + S^2 + r\sum_{i,j} (x_{ij.} - \theta^*_{ij})^2 \Big\} \Big/ (mnr + \nu + 2) , \tag{2.5a}$$

$$s_a^2 = \Big\{ \nu_a\lambda_a + \sum_i (\alpha^*_i - \alpha^*_.)^2 \Big\} \Big/ (m + \nu_a + 1) , \tag{2.5b}$$

$$s_b^2 = \Big\{ \nu_b\lambda_b + \sum_j (\beta^*_j - \beta^*_.)^2 \Big\} \Big/ (n + \nu_b + 1) , \tag{2.5c}$$

$$s_c^2 = \Big\{ \nu_c\lambda_c + \sum_{i,j} (\theta^*_{ij} - \alpha^*_i + \alpha^*_. - \beta^*_j + \beta^*_. - \theta^*_{..})^2 \Big\} \Big/ (mn + \nu_c + 1) , \tag{2.5d}$$

where $S^2 = \sum_{i,j,k} (x_{ijk} - x_{ij.})^2$, the usual within-cells sum of squares. For
reasons given in the main text, $mn$ in the denominator of (2.5d) can probably
be replaced by $(m - 1)(n - 1)$.
APPENDIX 3: Unequal Numbers of Observations in the Cells

In this appendix, we consider the case where cell $(i, j)$ contains $r_{ij}$
observations, not all equal. Since the change from constant $r_{ij} = r$ only
affects the first stage of the model, the calculations in Appendix 1, on
the second stage, are unaffected. However, $A_1^T C_1^{-1} A_1$ will be a diagonal
matrix with diagonal entries $r_{ij}/\sigma^2$. The result will be that $D^{-1}$ will not
have a constant diagonal entry; and $e$ in $C_2^{-1}$ will be replaced, not by
$e + r/\sigma^2$, but by entries $e + r_{ij}/\sigma^2$. Equations (1.14) will, therefore,
read

$$(e + r_{ij}/\sigma^2)\theta^*_{ij} + nf\theta^*_{i.} + mg\theta^*_{.j} + mnh'\theta^*_{..} = x_{ij.}\,r_{ij}/\sigma^2 . \tag{3.1}$$

It does not seem possible to write down the solution to these at all simply
and resort must be had to numerical calculation in any particular case. The
matrix on the left-hand side of (3.1) is the inverse of the posterior
dispersion matrix, $D$, and this too will have to be found numerically. It
is not, therefore, possible to give formulae for the variances and covariances
of $\theta_{ij}$, generalizing equations (11).
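The numerical calculation referred to here amounts to solving one linear system. The following sketch assumes known variances, a strictly positive interaction variance (so that the second-stage dispersion matrix can be inverted), and cell means and counts supplied as m-by-n arrays; the function name `posterior_unbalanced` and the simulated inputs are hypothetical.

```python
import numpy as np

def posterior_unbalanced(cell_means, counts, var, var_a, var_b, var_c):
    """Posterior mean and dispersion of (theta_ij) with counts[i, j] observations per cell."""
    m, n = cell_means.shape
    C2 = (var_c * np.eye(m * n)
          + var_a * np.kron(np.eye(m), np.ones((n, n)))
          + var_b * np.kron(np.ones((m, m)), np.eye(n)))
    C2_inv = np.linalg.inv(C2)
    u = C2_inv.sum(axis=1)                           # C2^{-1} times a vector of ones
    prior_precision = C2_inv - np.outer(u, u) / u.sum()
    D_inv = np.diag(counts.ravel() / var) + prior_precision   # coefficient matrix
    d = (counts * cell_means).ravel() / var                    # right-hand side
    D = np.linalg.inv(D_inv)                         # posterior dispersion matrix
    return (D @ d).reshape(m, n), D

rng = np.random.default_rng(2)
cell_means = rng.normal(size=(3, 4))
counts = rng.integers(1, 6, size=(3, 4))             # between 1 and 5 per cell
theta_star, D = posterior_unbalanced(cell_means, counts, 1.5, 0.8, 0.5, 0.3)
```

With equal counts in every cell the result coincides with the closed-form balanced estimate, which gives a convenient check on any implementation.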

With the $\theta$'s estimated, the argument leading to (2.2) is unaffected and
the $\alpha$'s may be found from

$$(\alpha_i - \alpha_.)^* = \frac{n\sigma_a^2}{\sigma_c^2 + n\sigma_a^2}\,(\theta_{i.} - \theta_{..})^* . \tag{3.2}$$

Similarly,

$$(\beta_j - \beta_.)^* = \frac{m\sigma_b^2}{\sigma_c^2 + m\sigma_b^2}\,(\theta_{.j} - \theta_{..})^* . \tag{3.3}$$

Note that equations (2.3) and (2.4) are no longer available.
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2 2 2 • . . . 

Finally, the estimation of o , a, , and o [equations . (2. 5b^d) ] is 

3. D C v 



unaltOTed, but the new estimation of o is 



2 

s = 

\ 



w ij ij 



(R + V -f 2) 



(3. A) 



replacing (2.5a). There, R = ^ E r. , . . > * 

Notice that the nonorthogonality problems that arise in the usual approach (for example, the nonindependence of the sums of squares) do not matter here. Nevertheless, the complicated form of the posterior dispersion matrix does make it much more difficult to describe and understand the analysis; and for this reason, the balanced design is much to be preferred.



APPENDIX 4: Exchangeability and Variance Estimation

The main part of this appendix is virtually independent of the rest of the paper, but the results obtained therein are then applied to the two-way analysis when the within-cell variances $\sigma_{ij}^2$ are not constant.

The estimation of variances has been discussed by Lindley (1971), but the analysis there contains some flaws; and we, therefore, approach the problem afresh. The simplest case of variance estimation is where there are m independent samples, each from a normal distribution of known, zero mean but unknown variance. Let the $i$th sample have variance $\phi_i$ and denote the data sum of squares about the mean (i.e., zero) by $S_i^2$; this will have $n_i$ degrees of freedom where $n_i$ is the size of that sample. The $(S_i^2)$ form a set of sufficient statistics, and the likelihood for the data is proportional to

$$\prod_{i=1}^{m} \phi_i^{-n_i/2} \exp\Big\{-\tfrac12 \sum_{i=1}^{m} S_i^2/\phi_i\Big\} . \qquad (4.1)$$



Suppose now that the prior opinions of the variances are that they are exchangeable. One way of achieving this is to suppose the $(\phi_i)$ themselves a random sample from some distribution: indeed, if the exchangeability is to hold for every m, then this is the only way to achieve it. It is convenient to suppose this distribution to be of the form conjugate to (4.1), namely, inverse-$\chi^2$. Specifically, we suppose the distribution of $\phi_i$ to be such that, for given $\nu$ and $\sigma^2$, $\nu\sigma^2/\phi_i$ is $\chi^2$ with $\nu$ degrees of freedom. There, $\nu$ and $\sigma^2$ are hyperparameters, $\sigma^2$ being a measure of location for $\phi_i$ (and, therefore, by the exchangeability, of every variance) and $\nu$ measuring the precision of that distribution. The prior distribution of the $\phi$'s, given the hyperparameters, is therefore,

$$\prod_{i=1}^{m} \frac{(\tfrac12\nu\sigma^2)^{\nu/2}}{(\tfrac12\nu - 1)!}\;\phi_i^{-(\nu/2 + 1)} \exp\Big\{-\frac{\nu\sigma^2}{2\phi_i}\Big\} .$$

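This may be rewritten

$$\exp\Big\{-\frac{m\nu\sigma^2}{2H}\Big\}\;\frac{(\nu\sigma^2)^{m\nu/2}}{G^{m(\nu/2 + 1)}}\;\frac{1}{2^{m\nu/2}\,[(\tfrac12\nu - 1)!]^m} \qquad (4.2)$$

where

$$\sum_{i=1}^{m} \phi_i^{-1} = m H^{-1}$$

and

$$\prod_{i=1}^{m} \phi_i = G^m , \qquad (4.3)$$

so that G and H are, respectively, the geometric and harmonic means of the variances we are trying to estimate.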

The next stage is the assignment of a prior distribution for $\nu$ and $\sigma^2$. In the earlier paper, equation (16) of Lindley (1971), we assigned a distribution of $\sigma^2$, given $\nu$; thus, making these two dependent. It seems more natural to think of them as independent since they measure quite different features of the distributions of the $\phi$'s. Suppose then that $\lambda\sigma^2$ is distributed as $\chi^2$ on $\delta$ degrees of freedom, $\delta$ and $\lambda$ being known constant values, independent of $\nu$ whose distribution will be discussed below. Since the mean of $\lambda\sigma^2$ is $\delta$, $\delta/\lambda$ is our prior estimate of any $\phi_i$. The value of $\delta$ reflects the precision attached to this estimate and would usually be small. The density of $\sigma^2$ is then proportional to

$$(\sigma^2)^{\delta/2 - 1} \exp(-\tfrac12\lambda\sigma^2) . \qquad (4.4)$$



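We have to multiply (4.2) by (4.4) and integrate the result with respect to $\sigma^2$. The only terms that involve $\sigma^2$ are

$$(\sigma^2)^{\frac12(\nu m + \delta) - 1} \exp\Big\{-\tfrac12\sigma^2\Big(\frac{m\nu}{H} + \lambda\Big)\Big\}$$

and the integration gives

$$\frac{2^{\frac12(\nu m + \delta)}\,[\tfrac12(\nu m + \delta) - 1]!}{(m\nu/H + \lambda)^{\frac12(\nu m + \delta)}} .$$

Restoring the terms so far omitted from (4.2), we get, apart from constants,

$$\frac{[\tfrac12(\nu m + \delta) - 1]!\;\nu^{m\nu/2}}{[(\tfrac12\nu - 1)!]^m\;G^{m(\nu/2 + 1)}\;(m\nu/H + \lambda)^{\frac12(\nu m + \delta)}} . \qquad (4.5)$$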

This complicated expression can be simplified using Stirling's formula for the factorial function. Its most convenient form for our purpose is

$$\log (a\nu + b)! \approx c + (a \log a - a)\nu + a\nu \log \nu + (b + \tfrac12)\log \nu$$

for constants a, b, and c. The logarithm of (4.5) is then, apart from a constant which does not involve the $\phi$'s, and omitting terms of order $\nu^{-1}$,

$$\tfrac12 m \log(H/G)\cdot\nu + \tfrac12(m - 1)\log \nu - (m \log G + \tfrac12\lambda H - \tfrac12\delta \log H) .$$

Hence, (4.5) is, approximately, equal to

$$\exp\Big(-\tfrac12 m \log\frac{G}{H}\cdot\nu\Big)\,\nu^{\frac12(m-1)}\,e^{-\frac12\lambda H}\,H^{\frac12\delta}\,G^{-m} . \qquad (4.6)$$
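Finally, suppose $\nu$ has a prior density proportional to

$$\exp(-\tfrac12\lambda'\nu)\,\nu^{\frac12\delta' - 1} . \qquad (4.7)$$

The product of (4.6) and (4.7) is then easily integrated with respect to $\nu$ to give

$$e^{-\frac12\lambda H}\,H^{\frac12\delta}\,G^{-m}\,\Big[m \log\frac{G}{H} + \lambda'\Big]^{-[\frac12(m-1) + \frac12\delta']} . \qquad (4.8)$$

On multiplying by the likelihood, we have finally for the posterior distribution of the $(\phi_i)$ a value proportional to

$$\prod_{i=1}^{m} \phi_i^{-(\frac12 n_i + 1)} \exp\Big\{-\tfrac12\Big(\sum_{i=1}^{m} S_i^2/\phi_i + \lambda H\Big)\Big\}\,H^{\frac12\delta}\,\Big[m \log\frac{G}{H} + \lambda'\Big]^{-[\frac12(m-1) + \frac12\delta']} . \qquad (4.9)$$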

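We proceed to find the modes of this distribution and to use these as estimates of the variances. Taking logarithms and differentiating (4.9), we have

$$-\frac{n_i}{2\phi_i} + \frac{S_i^2}{2\phi_i^2} - \frac{\lambda H^2}{2m\phi_i^2} + \frac{\delta H}{2m\phi_i^2} - \frac{1}{\phi_i} - \frac{[\tfrac12(m - 1) + \tfrac12\delta']}{(m \log\frac{G}{H} + \lambda')}\Big(\frac{1}{\phi_i} - \frac{H}{\phi_i^2}\Big) = 0 .$$

(In obtaining this result, the derivatives

$$\frac{\partial H}{\partial \phi_i} = \frac{H^2}{m\phi_i^2}$$

and

$$\frac{\partial G}{\partial \phi_i} = \frac{G}{m\phi_i} ,$$

which are easily verified, have been used.) Consequently, the estimate of $\phi_i$ is given by:

$$\phi_i^* = \frac{S_i^2 + (\delta H - \lambda H^2)/m + \dfrac{(m - 1 + \delta')\,H}{m \log\frac{G}{H} + \lambda'}}{(n_i + 2) + \dfrac{m - 1 + \delta'}{m \log\frac{G}{H} + \lambda'}} . \qquad (4.10)$$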



This rather complicated expression for $\phi_i^*$ can be simplified. If we put $\lambda = \delta = 0$, we are [equation (4.4)] effectively assuming that we have little prior knowledge of $\sigma^2$, and we have the usual prior for a variance proportional to $\sigma^{-2}$. This causes no convergence problems in (4.10). We cannot do the same for $\nu$ [equation (4.7)], but $\delta' = 1$ will simplify things a little [for then, $\tfrac12(m - 1) + \tfrac12\delta' = \tfrac12 m$], while avoiding convergence problems and yet representing diffuse knowledge of $\nu$. (4.10) then becomes

$$\phi_i^* = \frac{(n_i + 2)\,s_i^2 + \dfrac{H}{\log\frac{G}{H} + \lambda''}}{(n_i + 2) + \dfrac{1}{\log\frac{G}{H} + \lambda''}} \qquad (4.11)$$

where $\lambda'' = \lambda'/m$ and $s_i^2 = S_i^2/(n_i + 2)$.

The form of (4.11) is informative. $\phi_i^*$ is a weighted average of the usual estimate, $s_i^2$ (apart from a divisor $n_i + 2$ instead of $n_i$), and the harmonic mean of the $\phi$'s. (In this mean, we can conveniently replace $\phi_i$ by $s_i^2$.) Hence, we see that the estimates are pulled toward the harmonic mean just as the estimates of means move to the arithmetic mean. The weight attached to the mean is the reciprocal of $(\log\frac{G}{H} + \lambda'')$ and increases as the geometric and harmonic means come closer together (note that $G \geq H$). It is not possible to let $\lambda'' = 0$, since then, infinite weight is attached to the mean value with G = H.
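To make the weighted-average form of (4.11) concrete, here is a minimal Python sketch. It assumes one particular way of using the formula, namely a simple fixed-point iteration in which G and H are recomputed from the current estimates, starting from s_i² = S_i²/(n_i + 2). The text does not prescribe this scheme, the iteration is capped rather than proved convergent, and the function name and sample numbers are purely illustrative.

```python
import math


def shrink_variances(S2, n, lam2, max_iter=500, tol=1e-12):
    """Estimate variances by iterating the weighted average in (4.11).

    S2   : sums of squares about zero, one per sample
    n    : sample sizes (degrees of freedom), one per sample
    lam2 : the constant lambda'' (> 0)
    """
    m = len(S2)
    s2 = [S / (k + 2) for S, k in zip(S2, n)]   # usual estimates, divisor n_i + 2
    phi = list(s2)                              # starting values (an assumption)
    for _ in range(max_iter):
        G = math.exp(sum(math.log(p) for p in phi) / m)   # geometric mean
        H = m / sum(1.0 / p for p in phi)                 # harmonic mean
        # Weight on the harmonic mean; guard against rounding making log(G/H) < 0.
        w = 1.0 / (max(0.0, math.log(G / H)) + lam2)
        new = [((k + 2) * s + w * H) / ((k + 2) + w) for s, k in zip(s2, n)]
        converged = max(abs(a - b) for a, b in zip(new, phi)) < tol
        phi = new
        if converged:
            break
    return phi


if __name__ == "__main__":
    # Made-up sums of squares for three samples of size 4.
    print(shrink_variances([2.0, 8.0, 40.0], [4, 4, 4], lam2=0.1))
```

Each updated value is an average of s_i² and a harmonic mean lying inside the range of the current estimates, so the results stay between the smallest and largest s_i², which is the pull toward the harmonic mean described above.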

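Now, let us apply these results to the two-way analysis of variance. In the four-stage model, the probability distribution will be as (2.1) except that the terms involving $\sigma^2$ will be replaced by

$$\prod_{i,j} \sigma_{ij}^{-r_{ij}} \exp\Big\{-\tfrac12 \sum_{i,j,k} (x_{ijk} - x_{ij\cdot})^2/\sigma_{ij}^2 - \tfrac12 \sum_{i,j} (x_{ij\cdot} - \theta_{ij})^2\, r_{ij}/\sigma_{ij}^2\Big\} . \qquad (4.12)$$

On writing

$$S_{ij}^2(\theta_{ij}) = \sum_{k} (x_{ijk} - x_{ij\cdot})^2 + r_{ij}\,(x_{ij\cdot} - \theta_{ij})^2 , \qquad (4.13)$$

this becomes

$$\prod_{i,j} (\sigma_{ij}^2)^{-r_{ij}/2} \exp\Big\{-\tfrac12 \sum_{i,j} S_{ij}^2(\theta_{ij})/\sigma_{ij}^2\Big\} . \qquad (4.14)$$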

This is a likelihood of the same form as (4.1) with (bearing in mind the double suffixes) $\sigma_{ij}^2$ for $\phi_i$, $S_{ij}^2(\theta_{ij})$ for $S_i^2$, and $r_{ij}$ for $n_i$.

We now suppose the $\sigma_{ij}^2$ to be exchangeable. This may not be appropriate because it fails to exploit the row and column structure of the layout; but as a first approximation, it might be reasonable. If we do this, the appropriate estimate of $\sigma_{ij}^2$ is given by the equivalent of (4.11): that is,

$$s_{ij}^2 = \frac{S_{ij}^2(\theta_{ij}) + H p^{-1}}{(r_{ij} + 2) + p^{-1}} \qquad (4.15)$$

with

$$p = \log\frac{G}{H} + \lambda'' ,$$

G and H being, respectively, the geometric and harmonic means of the $(s_{ij}^2)$.

APPENDIX 5: Methods of Calculation 



In this appendix, we describe in summary form the steps to be carried out in calculating the estimates for the general case of unequal $r_{ij}$ and $\sigma_{ij}^2$.

(1) Calculate from the data the basic statistics, $(x_{ij\cdot})$ and the within-cell sums of squares $\sum_k (x_{ijk} - x_{ij\cdot})^2$. Insert prior values for $\lambda'$ (4.7) [$\lambda''$ in (4.15) is $\lambda'/mn$] and $\nu_t$, $\lambda_t$ (t = a, b, c) [for equations (2.5b-d)].

(2) Calculate initial estimates of $\sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$, and a pooled estimate $s^2$ of $\sigma^2$ using [expression for $s^2$ illegible in source] and

$$s_a^2 = rn \sum_i (x_{i\cdot\cdot} - x_{\cdot\cdot\cdot})^2/(m - 1) ,$$

$$s_b^2 = rm \sum_j (x_{\cdot j\cdot} - x_{\cdot\cdot\cdot})^2/(n - 1) ,$$

$$s_c^2 = r \sum_{i,j} (x_{ij\cdot} - x_{i\cdot\cdot} - x_{\cdot j\cdot} + x_{\cdot\cdot\cdot})^2/(m - 1)(n - 1) .$$

(3) With these estimates replacing $\sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$, and $\sigma^2$, solve equations (3.1) for $\theta_{ij}^*$. In these equations,

$$e = \sigma_c^{-2} , \quad nf = -n\sigma_a^2/\sigma_c^2(\sigma_c^2 + n\sigma_a^2) , \quad mg = -m\sigma_b^2/\sigma_c^2(\sigma_c^2 + m\sigma_b^2)$$

and

$$mnh' = \frac{mn\,\sigma_a^2\sigma_b^2}{\sigma_c^2\,(\sigma_c^2 + n\sigma_a^2 + m\sigma_b^2)}\Big[\frac{1}{\sigma_c^2 + n\sigma_a^2} + \frac{1}{\sigma_c^2 + m\sigma_b^2}\Big] .$$

(4) Still using these estimates, find $(\alpha_i - \alpha_\cdot)^*$ and $(\beta_j - \beta_\cdot)^*$ from equations (3.2) and (3.3).



(5) With $\theta_{ij}^*$ replacing $\theta_{ij}$, calculate $S_{ij}^2(\theta_{ij}^*)$, equation (4.13), and, hence, initial estimates, $s_{ij}^2$ of $\sigma_{ij}^2$, from (4.15). In this last formula, use G and H as the geometric and harmonic means [equations (4.3)] of the $(s_{ij}^2)$.

(6) Calculate revised estimates $s_a^2$, $s_b^2$, and $s_c^2$ using equations (2.5b-d).

(7) With these new estimates of $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$ and the estimates of $\sigma_{ij}^2$, re-solve equations (3.1) except that $\sigma^2$ is replaced by the estimate of $\sigma_{ij}^2$ where $\sigma^2$ divides $r_{ij}$ and $x_{ij\cdot}\,r_{ij}$.

(8) Repeat (4) using the new estimates for $\theta_{ij}$.

(9) Repeat (5).

(10) Repeat (6).

Repeat (7)-(10) until the results converge.

Notice that in the final solution of (3.1), at stage (7), the inverse matrix that is effectively obtained is the dispersion matrix of the $(\theta_{ij})$ and should be made available.
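The coefficients needed in steps (3) and (4) are simple functions of the variance components, and the sketch below evaluates them in Python. It assumes the forms e = 1/σ_c², nf = -nσ_a²/[σ_c²(σ_c² + nσ_a²)], mg = -mσ_b²/[σ_c²(σ_c² + mσ_b²)], mnh' = mnσ_a²σ_b²/[σ_c²(σ_c² + nσ_a² + mσ_b²)] · [1/(σ_c² + nσ_a²) + 1/(σ_c² + mσ_b²)], and the multipliers nσ_a²/(σ_c² + nσ_a²) and mσ_b²/(σ_c² + mσ_b²) of (3.2) and (3.3). The function names are hypothetical, and the input values are the variance estimates reported in Appendix 6, used only as an illustration.

```python
def precision_coefficients(m, n, var_a, var_b, var_c):
    """Return (e, nf, mg, mnh') for given variance components.

    m, n  : numbers of rows and columns
    var_* : estimates standing in for sigma_a^2, sigma_b^2, sigma_c^2
    """
    row_term = var_c + n * var_a                 # sigma_c^2 + n sigma_a^2
    col_term = var_c + m * var_b                 # sigma_c^2 + m sigma_b^2
    total = var_c + n * var_a + m * var_b        # sigma_c^2 + n sigma_a^2 + m sigma_b^2
    e = 1.0 / var_c
    nf = -n * var_a / (var_c * row_term)
    mg = -m * var_b / (var_c * col_term)
    mnh = (m * n * var_a * var_b / (var_c * total)) * (1.0 / row_term + 1.0 / col_term)
    return e, nf, mg, mnh


def shrinkage_multipliers(m, n, var_a, var_b, var_c):
    """Multipliers applied to row and column deviations in (3.2) and (3.3)."""
    row = n * var_a / (var_c + n * var_a)
    col = m * var_b / (var_c + m * var_b)
    return row, col


if __name__ == "__main__":
    # 3 fabrics (rows), 4 temperatures (columns); variances from Appendix 6.
    e, nf, mg, mnh = precision_coefficients(3, 4, 0.991, 5.591, 0.098)
    print(e, nf, mg, mnh)
    print(shrinkage_multipliers(3, 4, 0.991, 5.591, 0.098))
```

A useful sanity check on any implementation is that, with these forms, e + nf + mg + mnh' reduces algebraically to 1/(σ_c² + nσ_a² + mσ_b²).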
APPENDIX 6: A Numerical Example

In this appendix, we describe the results of analyzing a simple case using the methods developed in the paper. Richmers and Todd (1967) give the following data, in their Table (8.21), taken from an experiment on the breaking strength of three fabrics at four temperatures with two replicates at each of the twelve combinations.

                        Temperature
   Fabric      210      215      220      225

     A         1.8      2.0      4.6      7.5
               2.1      2.1      5.0      7.9

     B         2.2      4.2      5.4      9.8
               2.4      4.0      5.6      9.2

     C         2.8      4.4      8.7     13.2
               3.2      4.8      8.4     13.0

We, therefore, have the case of constant numbers of replicates, and we assume that $\sigma_{ij}^2$ is also fixed but unknown at $\sigma^2$. We, therefore, have the simpler situation discussed in the bulk of the paper. The prior distribution suggested therein seems appropriate except that exchangeability of the column values (temperatures) ignores the fact that they are in sequence. But such information on ordering is neglected in the usual analysis of variance technique, so we do the same for comparison purposes. In the standard method, the 3 degrees of freedom associated with temperature would be broken up into linear and perhaps quadratic terms; a parallel Bayesian analysis could easily be developed.
We took $\nu = 0$ and [values of the remaining prior constants illegible in source] in equations (2.5). These correspond to weak prior knowledge without causing convergence problems. (Values equal to 3 were also tried with only a small effect on the results.)

The next table gives for each of the 12 cells the estimate $\theta_{ij}^*$ of the cell mean obtained from equation (7) with estimates from (2.5) of the variance components replacing the $\sigma$'s. Also, included in brackets is the mean of the two original readings for that cell for comparison purposes. For each row and column there are similarly given the estimates $\alpha_i^*$ and $\beta_j^*$ from (3.2) and (3.3) together with the data means in brackets for comparison.



                          Temperature
   Fabric      210       215       220       225       Row

     A         1.39      2.41      5.11      8.80      4.31
              (1.95)    (2.05)    (4.80)    (7.70)    (4.13)

     B         2.24      3.49      6.02      9.85      5.38
              (2.30)    (4.10)    (5.50)    (9.60)    (5.38)

     C         3.63      4.85      7.72     11.62      7.10
              (3.00)    (4.60)    (8.55)   (13.10)    (7.31)

   Column      2.53      3.66      6.26      9.95
              (2.42)    (3.58)    (6.28)   (10.13)



The estimates of the variances are $s^2 = 0.495$, $s_a^2 = 0.991$, $s_b^2 = 5.591$, $s_c^2 = 0.098$. These show a large effect of temperature, a smaller effect of fabric, and a small interaction term. The estimates $\theta_{ij}^*$ are, therefore, dominated by the additive effect of the two factors. These, displayed in the borders of the table, show the usual shift toward the overall mean. For example, the value of $\beta_1^*$, the mean breaking strength at 210, is 2.53, greater than the observed mean of 2.42. The shift with the cell means is greater because of the almost complete removal of the interaction component. Thus, fabric A at 225 is estimated at 8.80 against an observed value of 7.70, which is a shift away from the mean. Notice that as a result of these shifts, the estimate of residual variance is at 0.495, much larger than the conventional value of 0.056 obtained from the 12 within-cell differences.
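The conventional residual variance quoted above can be reproduced directly from the raw readings. The short Python sketch below transcribes the twelve pairs of replicates from the data table in this appendix and pools the within-cell sums of squares over their 12 degrees of freedom; the variable and function names are illustrative only.

```python
# Breaking-strength readings transcribed from the data table in this appendix:
# fabric -> one pair of replicates per temperature (210, 215, 220, 225).
DATA = {
    "A": [(1.8, 2.1), (2.0, 2.1), (4.6, 5.0), (7.5, 7.9)],
    "B": [(2.2, 2.4), (4.2, 4.0), (5.4, 5.6), (9.8, 9.2)],
    "C": [(2.8, 3.2), (4.4, 4.8), (8.7, 8.4), (13.2, 13.0)],
}


def cell_means(data):
    """Mean of the replicates in each cell, keyed by fabric."""
    return {fabric: [sum(obs) / len(obs) for obs in cells]
            for fabric, cells in data.items()}


def within_cell_variance(data):
    """Pooled within-cell variance: within sum of squares / degrees of freedom."""
    ss = 0.0
    df = 0
    for cells in data.values():
        for obs in cells:
            mean = sum(obs) / len(obs)
            ss += sum((x - mean) ** 2 for x in obs)
            df += len(obs) - 1
    return ss / df


if __name__ == "__main__":
    print(round(within_cell_variance(DATA), 3))  # prints 0.056
```

The within-cell sum of squares is 0.675 on 12 degrees of freedom, giving 0.05625, which rounds to the 0.056 quoted in the text.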

I am most grateful to David Christ and Gerald Isaacs who wrote the computer program and ran the above example. Their enthusiasm and expertise were most helpful and provided an illuminating insight into the merits of interactive computing.